In this scenario, a nonprofit has data on homeless shelters in several states and would like to know where to direct its limited resources for the most impact. The data comes from Kaggle.
We begin by summarizing the data and checking for missing values. We also count how many states the data covers.
rawData <- read.csv("homelessness_shelter_data.csv")
summary(rawData)
## id date shelter_name city
## Min. : 1.0 Length:1000 Length:1000 Length:1000
## 1st Qu.: 250.8 Class :character Class :character Class :character
## Median : 500.5 Mode :character Mode :character Mode :character
## Mean : 500.5
## 3rd Qu.: 750.2
## Max. :1000.0
## state total_capacity occupied_beds available_beds
## Length:1000 Min. : 50.0 Min. : 0.00 Min. : 0.00
## Class :character 1st Qu.:115.0 1st Qu.: 38.00 1st Qu.: 35.00
## Mode :character Median :182.0 Median : 76.00 Median : 71.00
## Mean :179.1 Mean : 91.87 Mean : 87.25
## 3rd Qu.:243.0 3rd Qu.:136.00 3rd Qu.:128.25
## Max. :300.0 Max. :294.00 Max. :296.00
## occupancy_rate average_age male_percentage female_percentage
## Min. : 0.00 Min. :18.00 Min. :40.00 Min. :30.00
## 1st Qu.: 26.77 1st Qu.:30.00 1st Qu.:47.00 1st Qu.:38.00
## Median : 51.85 Median :42.00 Median :55.00 Median :45.00
## Mean : 51.21 Mean :42.04 Mean :54.63 Mean :45.37
## 3rd Qu.: 76.80 3rd Qu.:54.00 3rd Qu.:62.00 3rd Qu.:53.00
## Max. :100.00 Max. :65.00 Max. :70.00 Max. :60.00
## season notes
## Length:1000 Length:1000
## Class :character Class :character
## Mode :character Mode :character
##
##
##
length(unique(rawData$state))
## [1] 6
unique(rawData$state)
## [1] "TX" "CA" "AZ" "IL" "NY" "PA"
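For character columns such as `notes` and `season`, `summary()` lists only Length, Class, and Mode, so missing or blank entries in those columns would not show up there. Below is a minimal base-R sketch of a per-column check. It assumes that empty strings should count as missing. The `toy` data frame and the `count_missing` helper are hypothetical and are not taken from the Kaggle data:

```r
# Hypothetical toy data standing in for rawData (values are made up)
toy <- data.frame(
  state = c("TX", "CA", NA),
  occupancy_rate = c(50.5, NA, 75.0),
  notes = c("", "overflow", "none"),
  stringsAsFactors = FALSE
)

# Count missing values per column; in character columns, empty or
# whitespace-only strings are also treated as missing (an assumption)
count_missing <- function(df) {
  sapply(df, function(col) {
    if (is.character(col)) {
      sum(is.na(col) | trimws(col) == "")
    } else {
      sum(is.na(col))
    }
  })
}

count_missing(toy)
# returns c(state = 1, occupancy_rate = 1, notes = 1)
```

On the real data, the equivalent call would be `count_missing(rawData)`.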
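The summary shows no `NA's` counts in the numeric columns, and the data covers six states. Comparing bed availability and occupancy across cities, we see that Chicago, San Jose, and New York have low availability and generally high occupancy. That narrows our scope, so we next look at how the occupancy rate at these cities’ shelters has changed over time.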
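A city comparison like this can be sketched in base R by averaging `available_beds` and `occupancy_rate` per city and then sorting by availability. The rows and city labels below are made up for illustration. The sketch assumes that a plain mean over all records is a reasonable summary, which may not be how the original comparison was made:

```r
# Hypothetical toy rows; column names follow rawData, values are made up
toy <- data.frame(
  city = c("City A", "City A", "City B", "City B", "City C"),
  available_beds = c(20, 30, 150, 170, 40),
  occupancy_rate = c(85, 75, 20, 30, 70)
)

# Mean availability and mean occupancy rate for each city
by_city <- aggregate(cbind(available_beds, occupancy_rate) ~ city,
                     data = toy, FUN = mean)

# Lowest availability first: these cities are the first candidates for resources
by_city <- by_city[order(by_city$available_beds), ]
by_city
```

Here City A (25 beds available on average, 80% occupancy) sorts first, followed by City C and then City B.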
The occupancy rate appears to rise in all three cities at the beginning of 2025. The data is quite variable from month to month, so patterns are hard to see, but the same rise does not appear at the beginning of 2024. Of the three cities, Chicago’s average occupancy rate has doubled since the same time in 2023. We’ll next dive into Chicago’s shelters specifically.
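Below is a minimal base-R sketch of the kind of monthly comparison behind these observations. It averages each city’s occupancy rate by calendar month and then compares one month with the same month in an earlier year. It assumes that `date` holds ISO "YYYY-MM-DD" strings. The toy rows and the `ratio_between` helper are hypothetical:

```r
# Hypothetical toy rows; column names follow rawData, values are made up
toy <- data.frame(
  city = rep("City A", 4),
  date = c("2023-01-05", "2023-01-20", "2025-01-03", "2025-01-18"),
  occupancy_rate = c(30, 40, 60, 80)
)

# Label each record with its year and month, e.g. "2025-01"
toy$month <- format(as.Date(toy$date), "%Y-%m")

# Mean occupancy rate per city per month
monthly <- aggregate(occupancy_rate ~ city + month, data = toy, FUN = mean)

# Ratio of one city's mean rate in a later month to an earlier month
ratio_between <- function(monthly, which_city, later, earlier) {
  m <- monthly[monthly$city == which_city, ]
  m$occupancy_rate[m$month == later] / m$occupancy_rate[m$month == earlier]
}

ratio_between(monthly, "City A", "2025-01", "2023-01")
# returns 2 (a mean of 70 versus a mean of 35)
```

A ratio of 2 corresponds to the "doubled" comparison described above.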
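Let’s see how demand at individual Chicago shelters has changed. We measure this as the percent change in each shelter’s occupancy rate between its first and last recorded dates, after averaging any duplicate records for the same shelter and date.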
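library(dplyr)

rawData %>%
  filter(city == "Chicago") %>%
  # average any duplicate records for the same shelter and date
  group_by(shelter_name, date) %>%
  mutate(occupancy_rate = mean(occupancy_rate)) %>%
  distinct(shelter_name, date, .keep_all = TRUE) %>%
  # per shelter: occupancy rate on the first and last recorded dates
  group_by(shelter_name) %>%
  mutate(date = as.Date(date),
         firstOR = occupancy_rate[date == min(date)],
         lastOR = occupancy_rate[date == max(date)],
         # percent change from first to last date (not finite when firstOR is 0)
         change = ((lastOR - firstOR) / firstOR) * 100) %>%
  distinct(shelter_name, change) %>%
  arrange(desc(change))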
## # A tibble: 10 × 2
## # Groups: shelter_name [10]
## shelter_name change
## <chr> <dbl>
## 1 Safe Haven 1626.
## 2 Sunrise Shelter 218.
## 3 Shelter Plus 128.
## 4 Recovery Residence 58.7
## 5 Second Chance 25.3
## 6 Harbor Home 10.1
## 7 Pathway Place -11.9
## 8 New Beginnings -16.3
## 9 HomeSafe -60.4
## 10 Hope House -65.6
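The `change` column is computed as (lastOR - firstOR) / firstOR × 100. The summary shows that `occupancy_rate` can be 0, so a shelter whose first recorded rate is 0 would get `Inf` or `NaN`. Small starting rates also inflate the figure: Safe Haven’s 1626% means its last rate is roughly 17 times its first. Below is a minimal sketch of a guarded version. It assumes that `NA` is an acceptable marker for an undefined change. `pct_change` is a hypothetical helper and not part of the original analysis:

```r
# Percent change from a baseline; returns NA instead of Inf/NaN
# when the baseline (first value) is zero
pct_change <- function(first, last) {
  ifelse(first == 0, NA_real_, (last - first) / first * 100)
}

pct_change(c(50, 10, 0), c(75, 5, 20))
# returns c(50, -50, NA)
```

Inside the pipeline, `change = pct_change(firstOR, lastOR)` could replace the inline formula. Shelters with an undefined change would then be sorted last by `arrange()` instead of appearing as infinitely large.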